DETAILED ACTION

Response to Arguments 

1. Applicant's arguments filed 10/06/10 have been fully considered but they are not
persuasive.

Applicant argues that neither Boys nor Yokota disclose or suggest initiating a 
backward jump, counter to the forward sequence over a distance corresponding to a 
length of at least N spoken words using the word boundaries indicated in the word- 
marking data, to a target position, and then, starting from the target position, the control 
means initiates a replay of K words of the audio data in the forward sequence using the 
word boundaries indicated in the word-marking data, wherein K is less than N 
(Amendment, page 10). 

The examiner disagrees, since Yokota discloses "review playback is performed
program by program, but cue playback is performed within each program. Most
specifically, first the aforementioned cue playback is performed from the
beginning of the 5th program and after completion of the 5th program, the
playback jumps from the last data position of the 5th program to the beginning of
the 4th program, and the cue playback of the 4th program is performed... Thereafter
the above playback operation is advanced similarly for the next and subsequent
programs" (Performing cue playback in each program and jumping from the last data
position of that program to the beginning of the next and subsequent program implies
replaying of K words of the audio data in the forward sequence using the word
boundaries indicated in the word-marking data, since backward jumping is based on the
last data position of the program; col. 12, lines 3 - 20).

Applicant argues that neither Boys nor Yokota disclose or suggest generating 
word marking data, the word marking data indicating locations of word boundaries 
between spoken words within the audio data and linking words in the audio data to 
corresponding words in the text data (Amendment, page 10). 

The examiner disagrees, since Boys discloses "A user may speak a word or a 
phrase, and the system will rapidly search the document for a data string to match 
the digital print of the spoken phrase, moving the pointer to the beginning of a 
data string that matches. In a preferred embodiment input of machine-operable text 
code with the cursor in a voice region results in text being displayed in place of 
equivalent portions of the voice region" (col. 4, lines 34 - 38; col. 14, lines 17 - 22).

2. Applicant's arguments with respect to claims 1 - 27 have been considered but are 
moot in view of the new ground(s) of rejection. 

Applicant argues that neither Boys nor Yokota disclose or suggest that the word- 
marking data is assigned by the voice recognition means to the start of each spoken 
word in the audio data (Amendment, page 10).

Claim Rejections - 35 USC § 103

3. The text of those sections of Title 35, U.S. Code not included in this action can 
be found in a prior Office action. 

4. Claims 1 - 28 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Boys et al. (US Patent 5,875,448) in view of Yokota et al. (EP 0597483), and further in
view of Hanson (US PAP 2002/0062214).

Regarding claims 1 and 8, Boys et al. discloses an arrangement for replaying 
stored audio data (see col. 3, line 50), the system comprising: 

voice recognition means for performing voice recognition ("voice-recognition") on 
the audio data and generating by the voice recognition means text data and word- 
marking data ("beginning of a data string"), the word-marking data indicating 
locations of word boundaries between spoken words within the audio data ("a data 
string to match the digital print of the spoken phrase, moving the pointer to the 
beginning of a data string that matches"; col. 2, lines 45 - 47; col. 6, line 66 - col. 7,
line 1; col. 14, lines 17 - 22), and linking words in the audio data to corresponding words
in the text data ("with the cursor in a voice region results in text being displayed in
place of equivalent portions of the voice region"; col. 4, lines 34 - 38);

memory means for storing the audio data and for storing the text data and the 
word-marking data obtained from performing voice recognition on the audio data ("end 
of the file"; see col. 3, lines 48, 49; col. 11, lines 5-8; col. 6, lines 65 - 67; col. 4, lines
12 and 34-38);

display means for visually displaying the text data ("with the cursor in a voice
region results in text being displayed in place of equivalent portions of the voice
region"; col. 4, lines 34 - 38);

audio replaying means for replaying the audio acoustically in a forward 
sequence; the control means for controlling the replaying of stored audio data in a 
forward mode and in a reverse mode, the control means controlling the audio replaying 
means during a playback of the audio data in the reverse mode to perform a reverse 
mode playback operation including, starting from a replay position in the audio data ("a 
function called Return associated with Play moves the pointer immediately back to the 
position it held in the file at the beginning of the play function. The jog and Play 
functions are provided for a user to find positions in the file where additions, editing, or 
other functions are to be performed"; col. 13, lines 5-8 and 30 - 33; col. 11, lines 1 - 8);

the control means controlling the displaying on the display means of the stored 
text data that corresponds to the audio data being replayed, as indicated by the word- 
marking data; the display means to automatically repeat performing the reverse mode 
playback operation while the system is in the reverse mode ("search the document for 
a data string to match the digital print of the spoken phrase, moving the pointer 
to the beginning of a data string that matches. In a preferred embodiment input of 
machine-operable text code with the cursor in a voice region results in text being 
displayed in place of equivalent portions of the voice region"; col. 4, lines 34 - 38; 
col. 14, lines 17 - 22).

However, Boys et al. do not specifically teach the word-marking data is assigned
by the voice recognition means to the start of each spoken word in the audio data; 
initiating a backward jump, counter to the forward sequence over a distance 
corresponding to a length of at least N spoken words using the word boundaries 
indicated in the word-marking data, to a target position, and then, starting from the 
target position, the control means initiates a replay of K words of the audio data in the 
forward sequence using the word boundaries indicated in the word-marking data, 
wherein K is less than N, the control means further controlling the audio replaying 
means. 
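
For illustration only, the reverse-mode operation recited in the limitation above (jump back N words along the word boundaries, replay K words forward, with K less than N, and repeat) may be sketched as follows. The sketch is not taken from the application or from any cited reference; the function name, the representation of the word-marking data as a sorted list of word start times, and the assumption that each new jump starts from the position reached after the K-word replay are all hypothetical.

```python
from bisect import bisect_right


def reverse_mode_segments(word_starts, position, n, k):
    """Return the (start, end) audio spans replayed in reverse mode.

    word_starts: sorted start times of the spoken words (hypothetical
                 stand-in for the word-marking data).
    position:    current replay position, in the same time units.
    n, k:        jump back N words, then replay K words forward (K < N).
    """
    if not 0 < k < n:
        raise ValueError("requires 0 < K < N")
    # Index of the word that contains the current replay position.
    idx = bisect_right(word_starts, position) - 1
    segments = []
    while idx - n >= 0:
        target = idx - n  # backward jump over N word boundaries
        # Replay K words forward, starting at the target position.
        segments.append((word_starts[target], word_starts[target + k]))
        idx = target + k  # assumed: next jump starts where the replay ended
    return segments
```

With N = 2 and K = 1, each repetition moves the replay position back by one word, so the words are heard one at a time in reverse order while each word itself is played in the forward direction.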

Yokota et al., teach that hybrid playback is a combination of fast playback 
operations in cue and review modes. In this example, review playback is performed 
program by program, but cue playback is performed within each program. Most 
specifically, first the aforementioned cue playback is performed from the
beginning of the 5th program and after completion of the 5th program, the
playback jumps from the last data position of the 5th program to the beginning of
the 4th program, and the cue playback of the 4th program is performed... Thereafter
the above playback operation is advanced similarly for the next and subsequent
programs (col. 12, lines 3 - 20).
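
As a rough illustration of the sequencing described in the cited passage, and assuming (hypothetically) that each program is represented by a first and last data position keyed by program number, the order of hybrid playback could be modeled as below. The names and data layout are illustrative only and are not drawn from Yokota et al.; the fast skipping performed within each program during cue playback is not modeled.

```python
def hybrid_playback_order(program_spans, current):
    """Return (program, first_pos, last_pos) tuples in hybrid playback order.

    program_spans: dict mapping a 1-based program number to its
                   (first_pos, last_pos) data positions (hypothetical layout).
    current:       program number at which hybrid playback starts.
    """
    order = []
    # Programs are visited in reverse order (review direction) ...
    for number in range(current, 0, -1):
        first, last = program_spans[number]
        # ... but each program is traversed forward, from first to last
        # position (cue direction), before jumping to the previous program.
        order.append((number, first, last))
    return order
```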

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to use hybrid playback as taught by Yokota et al., in Boys 
et al., because that would provide an improved disc playback method which is capable 
of performing fast playback (col. 1, lines 41-44).



Application/Control Number: 10/531,013 Page 7

Art Unit: 2626 

However, Boys et al., in view of Yokota et al., do not specifically teach the word-
marking data is assigned by the voice recognition means to the start of each spoken
word in the audio data. 

Hanson discloses a method for marking dictated text for deferred correction or
review of dictated text in a speech recognition system proofreader, which, in accordance
with the inventive arrangement, comprises the steps of: displaying previously dictated text;
sequentially highlighting words in the text; selectively establishing a mark for different
ones of the sequentially highlighted words responsive to user commands; and, storing
the marks in an ordered list, each of the marks including a current position and length
of a corresponding marked word, whereby the marked words can be later recalled for
correction in accordance with the ordered list (paragraphs 7 and 8).
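
By way of example only, an ordered list of marks in which each mark carries a position and a length can be sketched as follows. The class and function names, the use of character offsets as the position, and the whitespace-based word splitting are assumptions made for illustration and are not taken from Hanson.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    position: int  # assumed: character offset of the marked word in the text
    length: int    # length of the marked word, in characters


def word_marks(text):
    """Build one candidate mark per whitespace-delimited word, in text order."""
    return [Mark(m.start(), len(m.group())) for m in re.finditer(r"\S+", text)]


def recall(text, marks):
    """Recall the marked words later, in the order the marks were stored."""
    return [text[m.position:m.position + m.length] for m in marks]


text = "the quick brown fox"
candidates = word_marks(text)
# Suppose the user marks the 2nd and 4th highlighted words for later review.
marked = [candidates[1], candidates[3]]
```

Here `recall(text, marked)` yields the marked words in list order, which corresponds to recalling marked words for correction in accordance with the ordered list.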

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to assign word-marking to each spoken word as taught by 
Hanson in Boys et al., in view of Yokota et al., because that would help obtain an
improved method and apparatus for marking text for later review and possible correction 
or revision (paragraph 5). 

Regarding claims 2 and 9, Yokota et al., further disclose repeating the reverse 
playback operation causes each of the K words on each repetition of the playback 
operation to be replayed acoustically in the forward sequence and in order counter to 
the forward sequence ("Most specifically, first the aforementioned cue playback is
performed from the beginning of the 5th program and after completion of the 5th
program, the playback jumps from the last data position of the 5th program to the
beginning of the 4th program, and the cue playback of the 4th program is performed";
col. 12, lines 3 - 20).

Regarding claim 3, Boys et al. further disclose that a counting means is assigned 
to control means in order to count the marking data reached during backward jumping 
or replaying (see col. 11, lines 1-8).

Regarding claim 4, Boys et al. further disclose that a timing circuit is assigned to 
control means in order to calculate the duration of the audio replay (see col. 11, lines
41-50).

Regarding claim 5, Boys et al. further disclose that setting means is connected to 
control means in order to set the speed of the audio replay (see col. 11, lines 41-50).

Regarding claims 6 and 15, Boys et al. further disclose that the control means is 
further connected to text memory means for storing text data corresponding to the audio 
data (see col. 7, lines 44-49), which is connected to text display means (see col. 7, lines 
26-29), and wherein the control means is set up to initiate, by means of linkage data for 
the audio data and text data, a synchronous replaying of the audio data and the text 
data corresponding to it (see col. 12, lines 30-41, lines 52-67). 



Regarding claim 7, Boys et al. further disclose that the control means and the 
text memory means and the memory means for the audio data are connected to voice 
recognition means, which undertakes an automatic transcription of the audio data to 
generate the text data ("converted the recorded areas to text"; see col. 16, lines 35-42). 

Regarding claim 10, Boys et al. further disclose that replaying in the forward 
sequence is automatically terminated when the next word-marking data is reached 
during replaying (see col. 13, lines 1-8). 

Regarding claim 11, Boys et al. further disclose that replaying in the forward
sequence is automatically terminated after a specified period (see col. 13, lines 1-8). 

Regarding claim 12, Boys et al. further disclose that, upon termination of the replay in
the forward sequence, a backward jump over a return distance corresponding to the 
length of at least roughly two words takes place automatically (see col. 13, lines 1-8). 

Regarding claim 13, Boys et al. further disclose that the backward jump in
the audio data is undertaken at a speed that is higher than the replay speed during 
replaying in the forward sequence, and without acoustic replaying of the stored audio 
data ("operates at faster than normal"; paragraph 12, lines 55 - 60).

Regarding claim 14, Boys et al. further disclose that the replaying of the
stored audio data in the forward sequence takes place at an adjustable replay speed
(see col. 11, lines 41-47).

Regarding claim 16, Boys et al. further disclose that during the visual
displaying of multiple words of the text data, the particular visually displayed word for
which the corresponding audio data is being replayed is visually highlighted (see col. 4,
lines 51-58, where the cursor highlights the word).

Regarding claim 17, Boys et al. further disclose that the text data
corresponding to audio data is obtained by means of an automatic voice recognition of 
the audio data, wherein, simultaneously, the word-marking data is generated and stored 
as linkage data for the text data and audio data that correspond with each other 
("comparison can be made between the entered text and the voice-recorded" see col. 7, 
lines 36-50; col. 16, lines 35-48). 

Regarding claim 18, Boys et al. further disclose a computer program
product that can be loaded into a memory of a computer, and which comprises sections 
of software code in order that, by means of their implementation following loading into 
the memory, the method as claimed in claim 8 can be implemented with the computer 
(see col. 16, lines 51-53).

Regarding claim 19, Boys et al. further disclose a computer program
product as claimed in claim 18, characterized in that it is stored on a computer-readable 
medium (see col. 16, lines 51-53). 

Regarding claim 20, Boys et al. further disclose a computer with a
processing unit and an internal memory, which computer is designed to implement the 
computer program product as claimed in claim 18 (see col. 16 lines 51-53). 

As per claim 21, Boys et al. teach an arrangement for replaying stored audio
data comprising: 

a voice recognition system configured to perform voice recognition on the audio 
data and to generate text data and word-marking data ("beginning of a data string"), 
the word-marking data indicating locations of word boundaries between spoken words
within the audio data ("a data string to match the digital print of the spoken phrase,
moving the pointer to the beginning of a data string that matches"; col. 2, lines 45
- 47; col. 6, line 66 - col. 7, line 1; col. 14, lines 17 - 22), and linking words in the audio
data to corresponding words in the text data ("with the cursor in a voice region results
in text being displayed in place of equivalent portions of the voice region"; col. 4,
lines 34-38);

a memory configured to store the audio data and to store the text data and the 
word-marking data obtained from performing voice recognition on the audio data ("end 
of the file... location of the file"; see col. 3, lines 48, 49; col. 11, lines 5-8; col. 6, lines 65
- 67; col. 4, line 12);

a display device configured to visually display the text data ("with the cursor in a 
voice region results in text being displayed in place of equivalent portions of the 
voice region"; col. 4, lines 34 - 38);

the controller further configured to display on the display device the text data that 
corresponds to the audio data being replayed, as indicated by the word-marking data 
("search the document for a data string to match the digital print of the spoken 
phrase, moving the pointer to the beginning of a data string that matches. In a 
preferred embodiment input of machine-operable text code with the cursor in a voice 
region results in text being displayed in place of equivalent portions of the voice 
region"; col. 4, lines 34 - 38; col. 14, lines 17 - 22).

Boys et al., do not specifically teach the word-marking data is assigned by the 
voice recognition means to the start of each spoken word in the audio data; a controller 
configured to playback the audio data in a reverse mode by jumping back N words 
using the word boundaries indicated in the word-marking data, playing back K words 
using the word boundaries indicated in the word-marking data, and then automatically 
repeating the jumping and playing back while in the reverse mode, wherein K is less 
than N. 

Yokota et al., teach that hybrid playback is a combination of fast playback 
operations in cue and review modes. In this example, review playback is performed 
program by program, but cue playback is performed within each program. Most
specifically, first the aforementioned cue playback is performed from the
beginning of the 5th program and after completion of the 5th program, the
playback jumps from the last data position of the 5th program to the beginning of
the 4th program, and the cue playback of the 4th program is performed... Thereafter
the above playback operation is advanced similarly for the next and subsequent
programs (Performing cue playback in each program and jumping from the last data
position of that program to the beginning of the next and subsequent program implies
replaying of K words of the audio data in the forward sequence using the word
boundaries indicated in the word-marking data, since backward jumping is based on the
last data position of the program; col. 12, lines 3 - 20).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to use hybrid playback as taught by Yokota et al., in Boys 
et al., because that would provide an improved disc playback method which is capable 
of performing fast playback (col. 1, lines 41-44).

However, Boys et al., in view of Yokota et al., do not specifically teach the word-
marking data is assigned by the voice recognition means to the start of each spoken
word in the audio data. 

Hanson discloses a method for marking dictated text for deferred correction or
review of dictated text in a speech recognition system proofreader, which, in accordance
with the inventive arrangement, comprises the steps of: displaying previously dictated text;
sequentially highlighting words in the text; selectively establishing a mark for different
ones of the sequentially highlighted words responsive to user commands; and, storing
the marks in an ordered list, each of the marks including a current position and length
of a corresponding marked word, whereby the marked words can be later recalled for
correction in accordance with the ordered list (paragraphs 7 and 8).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to assign word-marking to each spoken word as taught by 
Hanson in Boys et al., in view of Yokota et al., because that would help obtain an
improved method and apparatus for marking text for later review and possible correction 
or revision (paragraph 5). 

As per claim 22, Yokota et al. further suggest that N=2 and K=N-1 ("first the
aforementioned cue playback is performed from the beginning of the 5th program and
after completion of the 5th program, the playback jumps from the last data position of the
5th program to the beginning of the 4th program, and the cue playback of the 4th
program is performed"; col. 12, lines 3 - 20).

As per claims 23, and 24, Yokota et al., further suggest that the controller is 
configured to skip playback of a number of the words so that only every fourth or fifth of 
the words is replayed; configured to skip playback of a number of the words so that only 
every predetermined number of the words is replayed ("skipping 8 sectors which 
correspond to four of a 2-sector unitary block"; col. 10, lines 42 - 48).

As per claim 25, Yokota et al. further disclose that the playing back is for a
predetermined duration after which the automatically repeating the jumping and the
playing back are performed ("first the aforementioned cue playback is performed from
the beginning of the 5th program and after completion of the 5th program, the playback
jumps from the last data position of the 5th program to the beginning of the 4th program,
and the cue playback of the 4th program is performed"; col. 12, lines 3 - 20).

As per claim 26, Yokota et al. further disclose that the jumping back is for a
return distance which is one of an estimated mean data duration of the N words and
determined from a word-marking data associated with the audio data ("the playback
jumps from the last position of the 5th program to the beginning of the 4th program";
col. 12, lines 3 - 20).

As per claim 27, Yokota et al., further disclose the playing back is terminated in 
response to reaching one of a word-marking data associated with an end of the Kth 
word and a predetermined replay time ("cue playback is performed from the beginning
of the 5th program and after completion of the 5th program"; col. 12, lines 3 - 20).



Regarding claim 28, Boys et al. teach a memory device encoded with
instructions that, when executed by a computer, perform the method of claim 8 ("location
of the file"; see col. 3, lines 48, 49; col. 11, lines 5-8; col. 6, lines 65 - 67; col. 4, line 12).

Conclusion

5. Applicant's amendment necessitated the new ground(s) of rejection presented in 
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP 
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to LEONARD SAINT CYR whose telephone number is 
(571) 272-4247. The examiner can normally be reached Monday through Friday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil, can be reached on (571) 272-7602. The fax phone
number for the organization where this application or proceeding is assigned is
(571) 273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or (571) 272-1000.
LS 

12/14/10 

/Leonard Saint-Cyr/ 
Examiner, Art Unit 2626 



